Result Page Generation for Web Searching by Mostafa Alli
Author: Mostafa Alli
Language: eng
Format: epub, pdf
Publisher: IGI Global
An earlier study (Teufel and Moens, 2002) showed that summarizing scientific papers requires a dedicated summarization method, and that other existing methods may not be a good fit. We showed in section [] why extractive summarization is a good fit for this purpose. In this section, we report an empirical study of Google Scholar's ranking policy, comparing its default ranking, which relies heavily on citation scores, with a ranking based on the textual similarity of papers.
EMPIRICAL STUDY
In this study, we investigated the ranking behavior of Google Scholar when results are ordered by citation scores (the default) and when they are ordered by the textual similarity of papers. To compute the similarity of papers, we extracted the textual content of each paper and applied stop-word removal to clean the plain text. We then identified the most frequent keyword in the resulting bag of words and extracted the portions of the paper in which this keyword occurs; this is the normal version of a paper's summary. For the second version, we applied stemming after stop-word removal to group similar words under a common root word. We then counted word frequencies again, selected the most frequent stemmed keyword, and produced a summary with the same procedure used for normal keywords. We used both versions of the summaries in case they led to different behavior.
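The two summary variants described above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the stop-word list, the `naive_stem` helper, and the choice to keep whole sentences that contain the keyword are all assumptions, since the text does not specify them.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the study's list is not given).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Hypothetical toy stemmer: strips one of a few common suffixes."""
    for suffix in ("ings", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def top_keyword(text, stem=None):
    """Return the most frequent non-stop-word, optionally after stemming."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    if stem is not None:
        words = [stem(w) for w in words]
    if not words:
        return ""
    return Counter(words).most_common(1)[0][0]


def summarize(text, stem=None):
    """Keep the sentences in which the most frequent keyword occurs."""
    keyword = top_keyword(text, stem)
    if not keyword:
        return "", ""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if stem is not None:
            tokens = [stem(t) for t in tokens]
        if keyword in tokens:
            kept.append(sentence)
    return keyword, " ".join(kept)


# Made-up example text, for illustration only.
sample = ("Ranking papers is hard. Citations drive ranking. "
          "Summaries help readers. Rankings change over time.")
normal_keyword, normal_summary = summarize(sample)
stemmed_keyword, stemmed_summary = summarize(sample, stem=naive_stem)
```

On this toy input the stemmed variant also picks up the sentence containing "Rankings", which shows how the two summary types can diverge for the same paper.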
Evaluation Metric
For this study, we evaluated Google Scholar's ranking behavior using normality and regression curve estimation tests.
Procedure
To evaluate the ranking policy of Google Scholar and observe the effect of our proposal, we selected 8 random papers as input for Google Scholar. We then stored the papers that Google Scholar returned on the first 4 result pages for each input paper, together with their citation scores and their ranks in the Google Scholar listing. Next, we produced summaries of the same papers using the policies described above, for both normal and stemmed keywords. Finally, we re-ranked the Google Scholar listings according to these two types of similarity and analyzed them with the evaluation metrics.
Similarities are computed with cosine similarity, which operates in a vector space. The common formula for cosine similarity is:
$$\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} \tag{1}$$
where $A$ and $B$ are the summarized versions of the two given papers. We used these similarities between the candidate papers and the input paper to generate a ranked list of similar papers.
Here, each of these vectors represents the collection of words from one of the documents being compared.
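As a rough illustration of the cosine-similarity step and the re-ranking it feeds, the sketch below treats each summary as a bag of words with raw term counts. The term-count weighting, the function names, and the sample data are assumptions made for the example; the text does not say how the vectors were weighted.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two word collections, using raw term counts."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    # Only words present in both collections contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def rerank(input_words, candidates):
    """Order candidate titles by descending similarity to the input summary.

    `candidates` maps a (hypothetical) paper title to its list of summary words.
    """
    scored = [(cosine_similarity(input_words, words), title)
              for title, words in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Made-up summaries, for illustration only.
input_summary = ["search", "ranking", "citation"]
candidate_summaries = {
    "P1": ["ranking", "citation", "search"],
    "P2": ["biology", "cell"],
    "P3": ["search", "engine"],
}
ranking = rerank(input_summary, candidate_summaries)
```

Running the same `rerank` call once with normal-keyword summaries and once with stemmed-keyword summaries would yield the two similarity-based listings that are compared against the original Google Scholar order.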
Result
Accuracy and Distinctiveness of Similarity Values
In this section, we evaluate the similarity values produced by our proposed mechanism. In the first step, we applied a sample t-test to determine whether the two similarity procedures produced significantly different sets of similarity values. According to the p-values for all cases (0.0 < p < 0.03), we can conclude that the similarity values produced by normal keywords are significantly different from those produced by stemmed keywords.
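The text does not state which form of t-test was applied. As one plausible reading, the sketch below computes a paired t statistic, on the assumption that both similarity values are measured for the same candidate papers. It stops at the statistic and degrees of freedom; turning these into a p-value requires the t distribution, which is left to a statistics library. The function name and the numbers are made up for illustration.

```python
import math
import statistics


def paired_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for paired samples xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(xs, ys)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up similarity values for three papers (not the study's data).
normal_similarities = [3.0, 5.0, 7.0]
stemmed_similarities = [2.0, 3.0, 4.0]
t_value, dof = paired_t_statistic(normal_similarities, stemmed_similarities)
```

A large absolute t value relative to the degrees of freedom corresponds to a small p-value, which is the kind of evidence behind a claim that two sets of similarity values differ significantly.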
For the next step, we measured the mean similarity value of the papers produced by each summary type. According to the results, normal keywords produce higher similarity values of papers (19.
